Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method

Author

  • Martin A. Riedmiller
Abstract

This paper introduces NFQ, an algorithm for efficient and effective training of a Q-value function represented by a multi-layer perceptron. Based on the principle of storing and reusing transition experiences, a model-free, neural-network-based Reinforcement Learning algorithm is proposed. The method is evaluated on three benchmark problems. It is shown empirically that reasonably few interactions with the plant are needed to generate control policies of high quality.
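The loop the abstract describes (store transitions once, then repeatedly refit a network to bootstrapped targets computed from them) can be sketched roughly as follows. This is a minimal sketch, not the paper's implementation: the five-state chain `step`, the `MLP` class, the one-hot encoding, all hyperparameters, and the reward-maximizing sign convention are assumptions made for illustration, and plain batch gradient descent stands in for the paper's own training procedure.

```python
import numpy as np

GAMMA = 0.9                  # discount factor (assumed value)
N_STATES, N_ACTIONS = 5, 2   # toy chain: states 0..4, actions 0=left, 1=right


def step(state, action):
    """Hypothetical toy plant: reaching state 4 yields reward 1 and ends the episode."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, float(done), done


def encode(states, actions):
    """One-hot encode (state, action) pairs as network inputs."""
    x = np.zeros((len(states), N_STATES + N_ACTIONS))
    x[np.arange(len(states)), states] = 1.0
    x[np.arange(len(states)), N_STATES + np.asarray(actions)] = 1.0
    return x


class MLP:
    """One tanh hidden layer, scalar linear output, trained by batch gradient descent."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, n_hidden)
        self.b2 = 0.0

    def predict(self, x):
        return np.tanh(x @ self.w1 + self.b1) @ self.w2 + self.b2

    def fit(self, x, y, epochs=400, lr=0.05):
        for _ in range(epochs):
            h = np.tanh(x @ self.w1 + self.b1)
            err = h @ self.w2 + self.b2 - y                   # shape (N,)
            grad_h = np.outer(err, self.w2) * (1.0 - h ** 2)  # shape (N, hidden)
            self.w2 -= lr * h.T @ err / len(y)
            self.b2 -= lr * err.mean()
            self.w1 -= lr * x.T @ grad_h / len(y)
            self.b1 -= lr * grad_h.mean(axis=0)


def nfq(transitions, n_iterations=40, seed=0):
    """Fitted Q iteration on a fixed batch of (s, a, r, s', done) tuples."""
    s, a, r, s2, done = (np.array(col) for col in zip(*transitions))
    done = done.astype(float)
    net = MLP(N_STATES + N_ACTIONS, 16, np.random.default_rng(seed))
    x = encode(s, a)
    for _ in range(n_iterations):
        # Evaluate Q(s', b) for every action b with the current network.
        q_next = np.stack(
            [net.predict(encode(s2, np.full(len(s2), b))) for b in range(N_ACTIONS)],
            axis=1,
        )
        # Bootstrapped regression targets; terminal transitions keep only the reward.
        targets = r + GAMMA * (1.0 - done) * q_next.max(axis=1)
        net.fit(x, targets)  # warm-started supervised fit on the whole batch
    return net


# Stand-in for logged experience: one transition per non-terminal (state, action) pair.
transitions = []
for s in range(N_STATES - 1):
    for a in range(N_ACTIONS):
        s2, r, d = step(s, a)
        transitions.append((s, a, r, s2, d))

net = nfq(transitions)
states = np.repeat(np.arange(N_STATES), N_ACTIONS)
actions = np.tile(np.arange(N_ACTIONS), N_STATES)
q_table = net.predict(encode(states, actions)).reshape(N_STATES, N_ACTIONS)
print(q_table.argmax(axis=1)[: N_STATES - 1])  # greedy action per non-terminal state
```

Because the batch is fixed, each outer iteration is an ordinary supervised regression problem; only the targets change between iterations, and no further interaction with the plant is needed.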

Related resources

Learning to Play the Worker-Placement Game Euphoria using Neural Fitted Q Iteration

We design and implement an agent for the popular worker placement and resource management game Euphoria using Neural Fitted Q Iteration (NFQ), a reinforcement learning algorithm that uses an artificial neural network for the action-value function which is updated off-line considering a sequence of training experiences rather than online as in typical Q-learning. We find that the agent is able t...

Deep Reinforcement Learning with Regularized Convolutional Neural Fitted Q Iteration

We review the deep reinforcement learning setting, in which an agent receiving high-dimensional input from an environment learns a control policy without supervision using multilayer neural networks. We then extend the Neural Fitted Q Iteration value-based reinforcement learning algorithm (Riedmiller et al) by introducing a novel variation which we call Regularized Convolutional Neural Fitted Q...

An Empirical Comparison of Neural Architectures for Reinforcement Learning in Partially Observable Environments

This paper explores the performance of fitted neural Q iteration for reinforcement learning in several partially observable environments, using three recurrent neural network architectures: Long Short-Term Memory [7], Gated Recurrent Unit [3] and MUT1, a recurrent neural architecture evolved from a pool of several thousand candidate architectures [8]. A variant of fitted Q iteration, based on A...

Q-Batch: initial results with a novel update rule for Batch Reinforcement Learning

Batch Reinforcement Learning has established itself as a valuable alternative for developing learning and adaptive agents. Batch Reinforcement Learning algorithms are characterized by obtaining a policy from a set of collected data. Common methods apply adapted versions of RL update rules, such as Q-Learning, on the transitions of the batch, building a pattern set. The target values of the pattern r...

Deep Belief Nets as Function Approximators for Reinforcement Learning

We describe a continuous state/action reinforcement learning method which uses deep belief networks (DBNs) in conjunction with a value function-based reinforcement learning algorithm to learn effective control policies. Our approach is to first learn a model of the state-action space from data in an unsupervised pretraining phase, and then use neural-fitted Q-iteration (NFQ) to learn an accurat...

Publication date: 2005